The latest version of R installed (R version 3.5.2: https://cran.rstudio.com/). This is very important.
Preferably RStudio v1.2 (https://www.rstudio.com/products/rstudio/download/preview/). Note that this version has not been officially released yet. If you don’t have it or don’t want to install it, no big deal: you’ll still be able to do 98% of the workshop.
install.packages("knitr")
install.packages("rmarkdown")
install.packages("rticles")
install.packages("tinytex")
tinytex::install_tinytex() # Please run this command as well to install a LaTeX distribution. This may take a few minutes (~150 MB).
You can download the workshop material on GitHub.
Don’t hesitate to comment (new issues), or request changes (pull request) on GitHub.
Follow the .html (web browser) and the .Rmd (R studio) documents. Try and experiment.
~2 hours: Introduction and practice
~15 minutes: pause
~1 hour: other formats (.docx, .pdf, and others). Workshop here
~15 minutes: shiny. Workshop here
Markdown is a lightweight markup language with plain text formatting syntax (Easy-to-read, easy-to-write plain text format). It is designed so that it can be easily converted to HTML and many other formats (e.g. PDF, MS Word, .docx).
Like other markup languages (e.g. HTML and LaTeX), it is completely independent from R.
Typically, files have the extension .md.
Look at this example. Examine the html render (GitHub automatically interprets .md files) and raw file.
An extension of the markdown syntax that enables R code to be embedded and executed.
Generate fully reproducible reports in different static and dynamic output formats.
Most of these packages are maintained by the R studio team (https://rmarkdown.rstudio.com/, Yihui Xie)
Plain text files that typically have the file extension .Rmd.
Write text & code in R studio.
Knit: The R package rmarkdown feeds the .Rmd file to the R package knitr.
knitr executes code and creates a new markdown (.md) document which includes the code and output.
The .md file is subsequently transformed into .html/.tex/.docx by pandoc. (Note that .tex files need to be transformed by pdflatex into .pdf files. We’ll come back to that later.)
Pandoc is a universal document converter, independent of R.
By default, Rstudio comes with rmarkdown, knitr, and pandoc (but not pdflatex).
When you click the Knit button (top left), a document will be generated that includes both content as well as the output of R code within the document. You can also use the render() function.
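The knitting step described above can also be tried from the console. The following is a minimal sketch, assuming the knitr package is installed; the tiny document held in `rmd_lines` is a made-up example, not part of the workshop material. It passes R Markdown text to knitr::knit(), which runs the chunk and returns the intermediate markdown (the stage before pandoc).

```r
# A made-up, minimal R Markdown document held in a character vector.
rmd_lines <- c(
  "Some *markdown* text.",
  "",
  "```{r}",
  "1 + 1",
  "```"
)

# knit() executes the chunk and returns the resulting markdown
# (text, code, and output) as a single string; pandoc is not involved yet.
md <- knitr::knit(text = rmd_lines, quiet = TRUE)
cat(md)
```

To go all the way to a finished file from a saved document, rmarkdown::render("myfirstRmarkdown.Rmd") runs both the knitr and pandoc steps.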
R studio
file > new file > R markdown > HTML
Save it (“myfirstRmarkdown.Rmd”)
Knit
Examine the .html output.
Examine the .Rmd file structure.
Markdown provides an easy way to make standard types of formatted text, like:
italics (*text*) or italics (_text_)
bold (**bold**)
backslash (\) to interpret a special character literally
“# and space” at the beginning of line for a header level (6 levels, # to ######)
bold italic (_**bold italic**_)
links ([links](https://www.rmarkdown.rstudio.com))
comments (<!--comments-->)
Two trailing spaces or two carriage returns for a new line
*** for a horizontal line
quoted text (`quoted text`)
> Quoted text: 1st way
> more quoted text
> still more quoted text
Quoted text: 1st way
more quoted text
still more quoted text
`Quoted text: 2nd way`
`more quoted text`
`still more quoted text`
Quoted text: 2nd way
more quoted text
still more quoted text
```
Quoted text: 3rd way
more quoted text
still more quoted text
```
Quoted text: 3rd way
more quoted text
still more quoted text
Species | Counts
--- | ---
H. sapiens | 24
M. musculus | 442
| Species | Counts |
|---|---|
| H. sapiens | 24 |
| M. musculus | 442 |
You can use this Wikipedia text and list of rose subgenera as an example to reproduce.
Convert the document to an HTML webpage.
---
title: "Rmarkdown"
author: "Sebastien Renaut"
date: '2018-03-12'
output: html_document
---
The header is also referred to as metadata or YAML (“YAML Ain’t Markup Language”; https://en.wikipedia.org/wiki/YAML#History_and_name).
Header specifies configurations (what kind of document will be created, and the options chosen).
It is not required (defaults then apply).
It uses Python-style indentation to specify certain options.
Many options are possible, depending on what type of document you are generating. See below for some examples.
Note that some options can be specified either for the whole document (in the header), the code chunks, or both (chunks options supersede header). More on code chunks later.
---
title: "Rmarkdown"
author: "Sebastien Renaut"
date: "March 06, 2019"
output:
  html_document:
    highlight: tango
    number_sections: T
    theme: cerulean
    toc: yes
    toc_depth: 3
---
Note the indentation in the .Rmd document for the output options.
Note that the date is populated via an R function.
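One way the date field could be filled in by R is with inline code in the header. The snippet below is only a sketch; the format string is an assumption, chosen to match the "March 06, 2019" style shown above.

```r
# Format today's date as "Month day, year"; the month name depends on the locale.
today <- format(Sys.Date(), "%B %d, %Y")
print(today)

# In the YAML header, the same call would be wrapped in inline R code:
# date: "`r format(Sys.Date(), '%B %d, %Y')`"
```

Written this way, the date is recomputed every time the document is knitted.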
See the official R markdown lessons for more information. But these are some formats of interest:
output: html_document
output: ioslides_presentation
output: pdf_document (This requires a LaTeX distribution to be installed. We’ll get to that later.)
output: word_document (.docx)
interactive shiny apps (We’ll get to that later as well).
toc: yes Generate a table of contents (TOC).
toc_depth: 3 Depth of the TOC.
number_sections: T Add section numbering to headers. If you do not want a certain heading to be numbered, you can add {-} or {.unnumbered} after the heading text.
theme: specifies the theme to use for the page (“cerulean”, “journal”, “flatly”, “readable”, “spacelab”, “united”, and “cosmo”).
highlight: Code syntax highlighting style (e.g. “tango”, “pygments”, “kate”, “zenburn”).
See the cheatsheet and official R markdown book for more options.
The real power of R Markdown comes from mixing markdown syntax with chunks of code.
A code chunk is interpreted by knitr. It works essentially the same as the R syntax we are familiar with.
A code chunk may look like this:
```{r example, include = T, message = T, warning=T, echo = F, fig.cap="A figure of random points"}
#Running some R code.
x = rexp(1000)
min(x)
max(x)
plot(x)
```
## [1] 0.0002072449
## [1] 9.006855
A figure of random points
On the 1st line, I specify that I will run the R programming language.
Then, I give the chunk a UNIQUE name and specify options.
There are a large number of chunk options in knitr documented here.
include = FALSE: Code and results will NOT appear in the finished file. Code is still interpreted, and the results can be used by other chunks.
echo = F prevents code, but not results from appearing in the finished file. This is a useful way to embed figures.
message = F prevents messages that are generated by code from appearing in the finished file.
warning = F prevents warnings that are generated by code from appearing in the finished file.
fig.cap = "..." adds a caption to graphical results.
fig.width=..., fig.height=... change the figure width/height.
By default R studio creates a Global Options code chunk. Let’s examine this chunk:
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
See the cheat sheet for more info.
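As a sketch of how the global options chunk could be extended, the code below sets several defaults at once and then reads them back. It assumes knitr is installed; the particular option values are arbitrary choices for illustration.

```r
# Set defaults for every chunk that follows (normally done in the setup chunk).
knitr::opts_chunk$set(echo = FALSE, message = FALSE, fig.width = 6, fig.height = 4)

# Query the current defaults to confirm they were applied.
current <- knitr::opts_chunk$get(c("echo", "message", "fig.width"))
str(current)
```

Options written in an individual chunk header still override these defaults for that chunk.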
Note that you can also run inline code by enclosing it in backticks (` `) and specifying the programming language. For example, ` r 10+5 ` would be processed as 15.
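Inline code can be checked from the console in the same spirit. This is a hedged sketch, assuming knitr is installed; the two-part document in `inline_lines` is made up for illustration. A chunk defines `x`, and the inline expression is replaced by its value in the knitted markdown.

```r
# A made-up document: one chunk that defines x, then a sentence using it inline.
inline_lines <- c(
  "```{r}",
  "x <- 10 + 5",
  "```",
  "",
  "The sum is `r x`."
)

# After knitting, the inline expression is replaced by the value of x.
inline_md <- knitr::knit(text = inline_lines, quiet = TRUE)
cat(inline_md)
```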
R markdown can read and execute different languages!
## rmarkdown_main.Rmd
## rmarkdown_main.html
## rmarkdown_main.log
## ['hello', 'python!']
## Hello perl!
Mathematical material is set off by single dollar-sign characters (as in the LaTeX typesetting language).
So to write \(E = mc^{2}\), you’d write: $E = mc^{2}$
\(\sum_{i=1}^n ASV\)
\(F_{(1,69)}\) = 1.27, p-value=0.26
\(A = \pi*r^{2}\)
\(\sqrt{b^2 - 4ac}\)
If you wish to use a literal dollar sign, you need to preface it with a backslash: \(E = mc^{2}\) versus $E = mc^{2}$
Double dollar signs ($$...$$) produce displayed (centered) formulas: \[\sqrt{b^2 - 4ac}\]
See more example equations from this McGill math R markdown tutorial.
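As a further illustration (an example that is not in the workshop material), the quadratic formula, which contains the square-root term shown above, could be typeset as a displayed equation by writing the following in the body of the .Rmd file:

```latex
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
```

Replacing the double dollar signs with single ones would place the same formula inline with the surrounding text.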
There are several ways to include figures.
Figures can be included directly from a URL on the web:
{width=250px}
If this figure is small, it can be added to the text directly: eg.: Today, we are using to generate webpages with
images…
This is a graph previously saved in the figures directory
{width=250px}
In all these cases, graphs are rendered with pandoc and not knitr, so pandoc options need to be specified, not knitr R graphics options:
It’s simple, but options can be tricky.
You may need to play with spacing, figure size, and figure position.
Options are specified directly after the URL or link (e.g. {width=250px} or {width=50%}).
Images can also be interpreted by knitr as below:
```{r graphic_example, out.width = "20%", fig.cap = "rosa_banksiae", echo = F,fig.align = "center"}
knitr::include_graphics("../figures/rosa_banksiae.JPG")
```
rosa_banksiae.JPG
```{r roses, out.width = "50%",echo = F,out.extra='style="float:right; padding:10px"'}
knitr::include_graphics("../figures/rosa_banksiae.JPG")
```
The genus Rosa is subdivided into four subgenera:
Graphs can also be generated directly by R code, specified in a code chunk (R options specified in the code chunk) and interpreted by knitr as we did previously.
```{r another_example, echo = F, message = F}
library(ggplot2)
mtcars_ggplot = ggplot(mtcars, aes(x=wt, y=mpg)) +
geom_point() + geom_smooth()
mtcars_ggplot
```
```{r out.width=c('50%', '50%'), fig.show='hold',echo=F}
mtcars_ggplot
plot(rnorm(10))
```
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
By default, R Markdown displays data frames and matrices as they would be in the R terminal (in a monospaced font).
You can use the knitr::kable function for additional formatting, as in the .Rmd file below.
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
#With kable function from knitr (better looking)
knitr::kable(head(mtcars), digits = 1, caption = "A motorcars table")
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.9 | 2.6 | 16.5 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.9 | 2.9 | 17.0 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.8 | 2.3 | 18.6 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.1 | 3.2 | 19.4 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.1 | 3.4 | 17.0 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.8 | 3.5 | 20.2 | 1 | 0 | 3 | 1 |
Find a picture on the web. Save it.
Add it to document using either knitr or pandoc.
Repeat the exercise by linking directly to the figure.
Add a table using knitr.
For a footnote, insert the marker [^1] in the text and add the reference at the end using this format: [^1]: Renaut 2019. R markdown footnote. Number 1. pp1-2.
Otherwise, you may specify a bibliography and citation style by adding these two lines in the header.
csl: peerj.csl
bibliography: test_library.bib
The Citation Style Language (.csl) file specifies the reference format.
It is an open XML-based language that describes the formatting of citations and bibliographies. Reference management programs using .csl include Zotero, Mendeley and Papers3.
Most journals should have a .csl file in this GitHub repo, but you could create your own.
@article{altschul1997gapped,
title={Gapped BLAST and PSI-BLAST: a new generation of protein database search programs},
author={Altschul, Stephen F and Madden, Thomas L and Sch{\"a}ffer, Alejandro A and Zhang, Jinghui and Zhang, Zheng and Miller, Webb and Lipman, David J},
journal={Nucleic acids research},
volume={25},
number={17},
pages={3389--3402},
year={1997},
publisher={Oxford University Press}
}
Here, I created a .bib file (../biblio/test_library.bib) in the reference management software Papers3.
I often copy .bib references directly from Google Scholar and add them to a .bib database text file.
The bioinformatics program BLAST (Altschul et al., 1997) has been cited nearly 70,000 times. These are three random references (e.g. Thibert-Plante & Hendry, 2010; Wagner et al., 2012; Yoshida et al., 2014).
Citations go inside square brackets [ ] and are separated by semicolons (;).
Each citation must have a unique key, composed of ‘@’ + the citation identifier from the .bib database file.
A minus sign (-) before the @ will suppress mention of the author in the citation. This can be useful when the author is already mentioned in the text. For example, Stephen Altschul and a bunch of other people (1997) have been cited 70,000 times.
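The citation rules above can be sketched in a few lines of base R. This is only an illustration: the second entry and its key `hypothetical2010key` are made up, and the regular expression assumes each entry starts on its own line in the `@type{key,` form shown in the .bib example.

```r
# A tiny .bib database as text; the second entry is hypothetical.
bib_lines <- c(
  "@article{altschul1997gapped,",
  "  title={Gapped BLAST and PSI-BLAST: a new generation of protein database search programs},",
  "  year={1997}",
  "}",
  "@article{hypothetical2010key,",
  "  title={A made-up entry for illustration},",
  "  year={2010}",
  "}"
)

# Keep the lines that open an entry, then pull out the key between "{" and ",".
entry_lines <- grep("^@[A-Za-z]+\\{", bib_lines, value = TRUE)
keys <- sub("^@[A-Za-z]+\\{([^,]+),.*$", "\\1", entry_lines)

# Citations go in square brackets, each key prefixed by "@", separated by ";".
citation <- paste0("[", paste0("@", keys, collapse = "; "), "]")

# A minus sign before "@" suppresses the author name.
suppressed <- paste0("[-@", keys[1], "]")

print(citation)
print(suppressed)
```

The resulting strings are what would be typed into the body of the .Rmd file; pandoc then resolves them against the .bib and .csl files named in the header.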
Find 3 references in Google Scholar. Copy bibTeX references to a text file. Save it with a .bib extension.
Find, save as text file (.csl extension) and use another Citation Style Language from this GitHub repo (e.g. Nature, PLOS ONE, Indian Journal Of Dermatology, etc.). (hint: type ‘t’ in GitHub repo linked in Section 10 to activate search function).
(Note that references below are generated automatically, except for the footnote.)
Altschul SF., Madden TL., Schäffer AA., Zhang J., Zhang Z., Miller W., Lipman DJ. 1997. Gapped blast and psi-blast: A new generation of protein database search programs. Nucleic acids research 25:3389–3402.
Thibert-Plante X., Hendry A. 2010. The consequences of phenotypic plasticity for ecological speciation. Journal Of Evolutionary Biology:1–17.
Wagner CE., Keller I., Wittwer S., Selz OM., Mwaiko S., Greuter L., Sivasundar A., Seehausen O. 2012. Genome-wide RAD sequence data provide unprecedented resolution of species boundaries and relationships in the Lake Victoria cichlid adaptive radiation. 22:787–798.
Yoshida K., Makino T., Yamaguchi K., Shigenobu S., Hasebe M., Kawata M., Kume M., Mori S., Peichel CL., Toyoda A., Fujiyama A., Kitano J. 2014. Sex Chromosome Turnover Contributes to Genomic Divergence between Incipient Stickleback Species. PLoS Genetics 10:e1004223.
Renaut 2019. R markdown footnote. Number 1. pp1-2.